Abstract - Many robotic systems deal with uncertainty by
performing a sequence of information gathering actions. In this 
work, we focus on the problem of efficiently constructing such 
a sequence by drawing an explicit connection to submodularity. 
Ideally, we would like a method that finds the optimal sequence 
of actions, taking the minimum amount of time while providing 
sufficient information. Finding this sequence, however, is gen- 
erally intractable. As a result, many well-established methods 
select actions greedily. Surprisingly, this often performs well 
even with only one step lookahead. Our work first explains 
this high performance - we note that a commonly used metric, 
reduction of Shannon entropy, is submodular under certain 
assumptions, rendering the greedy solution comparable to 
the optimal plan in the offline setting. Recently developed
notions of adaptive submodularity enable guarantees for a greedy
algorithm in the online setting. We develop new methods within
this framework, enabling us to provide guarantees compared 
to the optimal online policy, as well as exploit additional 
computational speedups. We demonstrate the effectiveness of 
these methods in simulation and on a robot. 

I. Introduction 

Dealing with uncertainty is a fundamental problem in 
robotics. Uncertainty accumulates from various sources such 
as noisy sensors, inaccurate models, and calibration er- 
ror. This is particularly problematic for fine manipulation 
tasks [1], such as precise grasping, button pushing, or insert- 
ing a key into a keyhole. Due to the high accuracy required 
for these tasks, small errors often lead to catastrophic failure. 

A standard approach is to perform a sequence of uncer- 
tainty reducing actions [2]-[6]. In fine manipulation, such 
actions are often encoded as guarded moves [7], where the 
hand moves along a path until it feels a touch. 

The optimal sequence of actions provides enough informa- 
tion to accomplish the task while optimizing a performance 
criterion like minimum energy or time. Computing the op- 
timal sequence can be formulated as a Partially Observable 
Markov Decision Process (POMDP) [8]. However, finding
optimal solutions to POMDPs has been shown to be
PSPACE-complete [9]. Although several promising approximate
methods have been developed [10]-[13], they are still not well
suited for many manipulation tasks, due to the continuous 
state and observation spaces. 

In this paper, we address the efficient automatic con- 
struction of such a touch-based localization sequence. Our 
primary insight is a connection to submodularity, allowing us 
to utilize efficient greedy algorithms and additional computa- 
tional speedups. Furthermore, we provide guarantees that the 
sequence we select is near-optimal. Our experiments confirm
that our methods provide accurate localization in an efficient
manner. See Fig. 1 for an example sequence which enabled
a successful grasp of a door handle despite a noisy pose estimate.

Fig. 1: We adaptively select a sequence of touch actions to
reduce uncertainty. Here, we show actions selected by our
Hypothesis Pruning method, enabling a successful grasp.

Previous work on localization often utilizes online plan- 
ning within the POMDP framework, looking at locally 
reachable states during each decision step [14]. In general, 
these methods limit the search to a low horizon [15], often 
using the greedy strategy of selecting actions with the highest 
expected benefit in one step [2], [3], [5], [6]. This is generally 
out of necessity - computational time increases exponentially 
with the search depth. However, this simple greedy strategy 
often works surprisingly well for uncertainty reduction. 

One class of problems known to perform well with a 
greedy algorithm is submodular maximization. A metric is 
submodular if it exhibits the diminishing returns property, 
which we define rigorously in Section III-A. A striking
feature of submodular maximization is that a simple greedy
selection scheme is provably near-optimal. Furthermore, no 
polynomial time algorithm can guarantee optimality (unless 
P = NP) [16], [17]. 

One often used metric for uncertainty reduction is the ex- 
pected decrease in Shannon entropy [2]-[6], [18]-[20]. This 
is referred to as the information gain metric, and has been 
shown to be submodular under certain assumptions [21]. 



Not surprisingly, many robotic systems which perform well 
with a low horizon use this metric [2], [3], [5], [6], [18], 
though they do not make the connection with submodularity. 
We note that Hsiao mentions that touch localization could 
be formulated as a submodular maximization problem [15]. 
One of our contributions is identifying the assumptions
required for greedy action selection to be near-optimal,
their consequences, and applicability to different problems
in Section IV-A, as well as providing experimental results.

The guarantees for submodular maximization only hold 
in the non-adaptive setting. That is, if we were to select a 
sequence of actions offline, and perform the same sequence 
regardless of which observations we received online, greedy 
action selection would be near-optimal. Unfortunately, it has 
been shown that this can perform exponentially worse than 
a greedy adaptive algorithm for information gain [22], and 
thus we evaluate information gain online. 

Recent notions of adaptive submodularity [23] extend the 
guarantees of submodularity to the adaptive setting. The 
set of requirements for adaptive submodular functions is
different, and information gain does not meet those criteria. 
With information gain as our inspiration, we design a similar 
metric which does. In addition to providing guarantees with 
respect to that metric, we can use a lazy-greedy algo- 
rithm [23], [24] which does not reevaluate every action at 
each step, enabling a computational speedup. 

In this work, we draw an explicit connection between 
touch based localization with a robotic end effector and 
submodularity. Understanding this connection enables us to 
design methods that fit into this framework and use a more 
efficient algorithm with near-optimal performance. 

Along those lines, we present two approaches for uncer- 
tainty reducing action selection. The first approach optimizes 
the information gain by fitting a Gaussian distribution to the 
remaining particles, and evaluating the expected entropy that 
results from each action. The second approach maximizes the 
expected number of hypotheses it will disprove. We show 
that our formulation of this metric is adaptive submodular. 
We apply both methods to selecting touch based sensing 
actions. We present results in Section V, comparing the
accuracy and computation time of each metric. Finally, we 
show the applicability of these methods on a real robot. 

II. Related work 

Hsiao et al. [5], [15] formulate the problem of producing 
uncertainty reducing tactile actions as a POMDP. Since it
is intractable to solve fully, they perform a forward search 
online. Potential actions are specified by a small set (typically 
~5) of preset world-relative trajectories [5], making for a low 
branching factor. By searching at a limited horizon and per- 
forming aggressive pruning and clustering of observations, 
they circumvent computational issues. However, this can still 
take many seconds to compute each action and update, and 
there are no guarantees for optimality. In contrast, our work 
focuses on simpler actions and observations with a horizon 
of one, enabling us to consider significantly more actions 
(typically ~150) and achieve localization more quickly. 



Hebert et al. [6] independently approached the problem of 
action selection for touch based localization. They utilize a 
greedy information gain metric, similar to our own. However, 
they do not make a connection to submodularity, and provide 
no theoretical guarantees with their approach. Additionally, 
they model noise only in $x, y, z$, while we use $x, y, z, \theta$.
Furthermore, by using a particle based representation instead 
of a histogram (as in [6], [15]), we can model the underlying 
belief distribution more efficiently. 

Others forgo the ability to plan with the entire belief space 
altogether, projecting onto a low-dimensional space before 
generating a plan to the goal. During execution, this plan 
will likely fail, because the true state was not known. Erez 
and Smart use local controllers to adjust the trajectory [25]. 
Platt et al. note when the belief space diverges from what
the plan expected, and re-plan from the new belief [26]. 
They prove their approach will eventually converge to the 
true hypothesis. While these methods plan significantly faster 
due to their low-dimensional projection, they pick actions 
suboptimally. Furthermore, by ignoring part of the belief 
space, they sacrifice the ability to avoid potential failures. For 
example, these methods cannot guarantee that a trajectory 
will not collide and knock over an object, since the planner 
may ignore the part of the belief space where the object is 
actually located. 

Petrovskaya et al. [27] consider the problem of full 6DOF
pose estimation of objects through tactile feedback. Their
primary contribution is an algorithm capable of running in
the full 6DOF space quickly. In their experiments, action
selection was done randomly, as they do not attempt to 
select optimal actions. To achieve an error of ~5mm, they 
needed an average of 29 actions for objects with complicated 
meshes. While this does show that even random actions 
achieve localization eventually, we note that our methods 
take significantly fewer actions. 

In the DARPA Autonomous Robotic Manipulation Soft- 
ware (ARM-S) competition, teams were required to local- 
ize, grasp, and manipulate various objects within a time 
limit. Many teams first took uncertainty reducing actions 
before attempting to accomplish tasks [28]. Similar strategies 
were used to enable a robot to prepare a meal with a 
microwave [29], where touch-based actions are used prior 
to pushing buttons. To accomplish these tasks quickly, some 
of these works rely on hand-tuned motions and policies,
specified for a particular object and environment. While this 
enables very fast localization with high accuracy, a sequence 
must be created manually for each task and environment. 
Furthermore, these sequences aren't entirely adaptive. 

Dogar and Srinivasa [30] use the natural interaction of 
an end effector and an object to handle uncertainty with 
a push-grasp. By utilizing offline simulation, they reduce 
the online problem to enclosing the object's uncertainty in 
a pre-computed capture region. Online, they simply plan a 
push-grasp which encloses the uncertainty inside the capture 
region. This work is complementary to ours - the push-grasp
works well on objects which slide easily, while we assume
objects do not move. We believe each approach is applicable
in different scenarios.

Outside of robotics, many have addressed the problem of 
query selection for object identification. In the noise-free 
setting, a simple adaptive algorithm known as generalized 
binary search (GBS) [31] is provably near optimal. Inter- 
estingly, this algorithm selects queries identically to greedy 
information gain if there are only two outcomes [19]. The 
GBS method was extended to multiple outcomes, and shown 
to be adaptive submodular [23]. Our Hypothesis Pruning 
metric is similar to this formulation but with a more general 
observation space, allowing us to essentially model some 
amount of noise. 

Recently, there have been guarantees made for the case of 
noisy observations. For binary outcomes and independent, 
random noise, the GBS was extended to noisy generalized 
binary search [32]. For cases of persistent noise, where 
performing the same action results in the same noisy out- 
come, adaptive submodular formulations have been devel- 
oped based on eliminating noisy versions of each hypothe- 
sis [33], [34]. In all of these cases, the message is the same 
- with the right formulation, greedy selection performs well 
for uncertainty reduction. 

III. Problem Formulation 

We review the basic formulation for adaptive submodular 
maximization. For a more detailed explanation, see [23]. 

Let a possible object state be $\phi$, called the realization.
Let $\Phi$ be a random variable over all realizations. Thus, the
probability of a certain state is given by $p(\phi) = P[\Phi = \phi]$.
At each decision step, we select an action $a$ from $\mathcal{A}$, the
set of all available actions, which incurs a cost $c(a)$. Each
action will result in some observation $o$ from $O$, the set of all
possible observations. We assume that given a realization $\phi$,
the outcome of an action $a$ is deterministic. Let $A \subseteq \mathcal{A}$ be all
the actions selected so far. During execution, we maintain the
partial realization $\psi_A$, a sequence of observations received
indexed by $A$. We call it a partial realization as it encodes
how realizations $\phi \in \Phi$ agree with observations.

For the case of tactile localization, $\Phi$ is the object pose.
$\mathcal{A}$ corresponds to all end-effector guarded move trajectories,
which terminate when the hand touches an obstacle. $O$
encompasses any possible observation, which is the set of
all distances along any trajectory within which the guarded
move may terminate. The partial realization $\psi_A$ essentially
encodes the "belief state" used in POMDPs, which we denote
by $p(\phi \mid \psi_A) = P[\Phi = \phi \mid \psi_A]$.

Our goal is to find an adaptive policy for selecting actions
based on observations so far. Formally, a policy $\pi$ is a
mapping from a partial realization $\psi_A$ to an action item $a$.
Let $A(\pi, \phi)$ be the set of actions selected by policy $\pi$ if the
true state is $\phi$. We define two cost functions for a policy -
the average cost and the worst case cost. These are:

$$c_{avg}(\pi) = \mathbb{E}_{\Phi}\left[c(A(\pi, \Phi))\right]$$
$$c_{wc}(\pi) = \max_{\phi} c(A(\pi, \phi))$$
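As an illustration of the two cost functions, the sketch below evaluates them for a made-up problem. The realizations, prior, action names, and costs are hypothetical, and it assumes the cost of a set of actions is the sum of the individual action costs, which the text above does not state explicitly.

```python
from typing import Dict, List, Tuple


def policy_costs(prior: Dict[str, float],
                 actions_taken: Dict[str, List[str]],
                 cost: Dict[str, float]) -> Tuple[float, float]:
    """Return (average cost, worst-case cost) of a fixed policy.

    actions_taken[phi] plays the role of A(pi, phi): the actions the
    policy would execute if phi were the true state.
    """
    # Assumption: the cost of an action set is the sum of its action costs.
    per_state = {phi: sum(cost[a] for a in acts)
                 for phi, acts in actions_taken.items()}
    c_avg = sum(prior[phi] * per_state[phi] for phi in prior)
    c_wc = max(per_state.values())
    return c_avg, c_wc


# Hypothetical toy problem: three realizations, three actions.
prior = {"phi1": 0.5, "phi2": 0.25, "phi3": 0.25}
actions_taken = {"phi1": ["a1"], "phi2": ["a1", "a2"], "phi3": ["a1", "a3"]}
cost = {"a1": 1.0, "a2": 2.0, "a3": 4.0}
c_avg, c_wc = policy_costs(prior, actions_taken, cost)
print(c_avg, c_wc)
```

The average cost weights each realization by its prior probability, while the worst-case cost only looks at the most expensive realization, so a policy can rank differently under the two criteria.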

Define some utility function $f : 2^{\mathcal{A}} \times O^{\mathcal{A}} \to \mathbb{R}_{\geq 0}$, which
depends on actions selected and observations received. We
would like to find a policy that will reach some utility
threshold $Q$ while minimizing one of our cost functions.
Formally:

$$\min_{\pi} \; c_{\{avg,\, wc\}}(\pi)$$
$$\text{s.t. } f(A(\pi, \phi), \phi) \geq Q, \;\; \forall \phi$$

This is often referred to as the Minimum Cost Cover
problem, where we achieve some coverage $Q$ while minimizing
the cost to do so. We can consider optimal policies
$\pi^*_{avg}$ and $\pi^*_{wc}$ for the above, optimized for their respective
cost functions. Unfortunately, obtaining even approximate
solutions is difficult [16], [23]. However, a simple greedy
algorithm achieves near-optimal performance if our objective
function $f$ satisfies properties of adaptive submodularity and
monotonicity. We now briefly review these properties.

A. Submodularity 

First, let us consider the case when we do not condition on
observations, optimizing an offline plan. We call a function
$f$ submodular if whenever $X \subseteq Y \subseteq \mathcal{A}$, $a \in \mathcal{A} \setminus Y$:

Submodularity (diminishing returns):

$$f(X \cup \{a\}) - f(X) \geq f(Y \cup \{a\}) - f(Y)$$

The marginal benefit of adding $a$ to a smaller set $X$ is at
least as much as adding it to the superset $Y$. We also require
monotonicity, or that adding more elements never hurts:

Monotonicity (more never hurts):

$$f(X \cup \{a\}) - f(X) \geq 0$$

The greedy algorithm maximizes $\frac{f(A \cup \{a\}) - f(A)}{c(a)}$, the
marginal utility per unit cost. As observations are not incorporated,
this corresponds to an offline plan. If submodularity
and monotonicity are satisfied, the greedy algorithm will be within
a $(1 + \ln \max_a f(a))$ factor of optimal for integer valued $f$ [17].
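The offline greedy rule can be sketched on a small set cover instance, a standard example of a monotone submodular function. The action names, the elements each action covers, and the unit costs below are hypothetical and chosen only to make the selection easy to follow.

```python
from typing import Dict, FrozenSet, List, Set


def coverage(selected: Set[str], covers: Dict[str, FrozenSet[int]]) -> int:
    """Monotone submodular utility: number of elements covered."""
    covered: Set[int] = set()
    for a in selected:
        covered |= covers[a]
    return len(covered)


def greedy_offline(covers: Dict[str, FrozenSet[int]],
                   cost: Dict[str, float],
                   target: int) -> List[str]:
    """Pick actions by marginal utility per unit cost until f >= target."""
    chosen: List[str] = []
    while coverage(set(chosen), covers) < target:
        base = coverage(set(chosen), covers)
        best, best_ratio = None, 0.0
        for a in covers:
            if a in chosen:
                continue
            gain = coverage(set(chosen) | {a}, covers) - base
            ratio = gain / cost[a]
            if ratio > best_ratio:
                best, best_ratio = a, ratio
        if best is None:  # no remaining action adds utility
            break
        chosen.append(best)
    return chosen


# Hypothetical instance: four actions covering elements 1..6.
covers = {"a1": frozenset({1, 2, 3}), "a2": frozenset({3, 4}),
          "a3": frozenset({4, 5, 6}), "a4": frozenset({1, 6})}
cost = {"a1": 1.0, "a2": 1.0, "a3": 1.0, "a4": 1.0}
plan = greedy_offline(covers, cost, target=6)
print(plan)
```

Because no observations are used, the whole plan can be computed before execution, which is exactly the non-adaptive setting the guarantee covers.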

B. Adaptive Submodularity 

Now we consider the case where the policy adapts to new
observations [23]. In this case, the expected marginal benefit
of performing an action is:

$$\Delta(a \mid \psi_A) = \mathbb{E}\left[f(A \cup \{a\}, \Phi) - f(A, \Phi) \mid \psi_A\right]$$

We call a function $f$ adaptive submodular if whenever
$X \subseteq Y \subseteq \mathcal{A}$, $a \in \mathcal{A} \setminus Y$:

Adaptive Submodularity:

$$\Delta(a \mid \psi_X) \geq \Delta(a \mid \psi_Y)$$

That is, the expected benefit of adding $a$ to a smaller set $X$
is at least as much as adding it to the superset $Y$, for any set
of observations received from actions $Y \setminus X$. We also require
strong adaptive monotonicity, or more items never hurts. For
any $a \notin X$, and any possible outcome $o$, this requires:

Strong Adaptive Monotonicity:

$$\mathbb{E}\left[f(X, \Phi) \mid \psi_X\right] \leq \mathbb{E}\left[f(X \cup \{a\}, \Phi) \mid \psi_X, \psi_a = o\right]$$

Fig. 2: We can think of tactile localization as a problem of set
cover, which is adaptive submodular [23]. Each observation
amounts to covering (green area) the hypotheses (black
dots) which do not agree. Our objective is to maximize our
coverage, or rule out as many hypotheses as possible.

In this case, the greedy algorithm maximizes $\frac{\Delta(a \mid \psi_X)}{c(a)}$. This
encodes an online policy, since at each step $\psi_X$ incorporates the
new observations. Surprisingly, we can bound the performance
of the same algorithm with respect to both the optimal
average case policy $\pi^*_{avg}$ and optimal worst case policy $\pi^*_{wc}$.
For integer valued $f$ and self-certifying instances, this has been
shown to have a $(1 + \ln(Q))$ approximation for $\pi^*_{avg}$, and a
$\left(1 + \ln\left(\frac{Q}{\min_\phi p(\phi)}\right)\right)$ approximation for $\pi^*_{wc}$
(see [23] for a more detailed explanation).
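To get a feel for how these factors grow, the sketch below evaluates them numerically. It assumes the bounds for integer valued $f$ take the form $1 + \ln Q$ (average case) and $1 + \ln(Q / \min_\phi p(\phi))$ (worst case); the function names and the example values of $Q$ and the minimum prior probability are hypothetical.

```python
import math


def avg_case_factor(Q: float) -> float:
    """Approximation factor relative to the optimal average-case policy."""
    return 1.0 + math.log(Q)


def worst_case_factor(Q: float, p_min: float) -> float:
    """Approximation factor relative to the optimal worst-case policy.

    p_min is the smallest prior probability of any realization.
    """
    return 1.0 + math.log(Q / p_min)


# Hypothetical values: utility threshold 100, least likely state has prior 1%.
Q, p_min = 100.0, 0.01
print(avg_case_factor(Q), worst_case_factor(Q, p_min))
```

Both factors grow only logarithmically, so even a hundredfold increase in $Q$ adds a constant of about 4.6 to either bound.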

IV. Application to Touch Localization 

We would like to appeal to the above algorithms and
guarantees for touch localization, while still maintaining
generality for different objects and motions. Given an object
mesh, we model the random realization $\Phi$ as a set of sampled
particles. We can think of each particle $\phi \in \Phi$ as representing
some hypothesis of the true object pose.

Each action $a \in \mathcal{A}$ corresponds to an end-effector trajectory
which stops when the object is touched. The cost $c(a)$ is the
time it would take to run this entire trajectory, plus some
fixed amount for moving to the start pose. An observation
$o \in \mathbb{R}$ corresponds to the time it takes for the end-effector
to make contact with the object. We define $a_\phi$ as the time
during trajectory $a$ where contact first occurs if the true state
were $\phi$. See Figure 3 for an example. If the swept path of $a$
does not contact object $\phi$, then $a_\phi = \infty$. Note that this allows
us to handle the observation corresponding to no contact.
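A simplified version of computing $a_\phi$ can be sketched in 2D. The sketch replaces the object mesh with a disc and the hand with a point moving along a straight line at constant speed, which is only a stand-in for the swept-volume collision check a real system would need; the function name and all parameters are hypothetical.

```python
import math
from typing import Tuple


def contact_time(start: Tuple[float, float],
                 direction: Tuple[float, float],
                 speed: float,
                 center: Tuple[float, float],
                 radius: float,
                 max_dist: float) -> float:
    """Time at which a point end-effector first touches a disc, or inf.

    Assumes the trajectory starts outside the disc.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm
    # Solve |start + s * d - center|^2 = radius^2 for the distance s.
    fx, fy = start[0] - center[0], start[1] - center[1]
    b = fx * dx + fy * dy
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return math.inf            # the line misses the disc entirely
    s = -b - math.sqrt(disc)       # nearer of the two intersections
    if s < 0.0 or s > max_dist:
        return math.inf            # behind the start or beyond the path
    return s / speed


# Hypothetical poses: one hypothesis on the path, one off to the side.
t_hit = contact_time((0.0, 0.0), (1.0, 0.0), 0.1, (1.0, 0.0), 0.25, 2.0)
t_miss = contact_time((0.0, 0.0), (1.0, 0.0), 0.1, (1.0, 1.0), 0.25, 2.0)
print(t_hit, t_miss)
```

Returning `math.inf` for a miss mirrors the convention $a_\phi = \infty$, so "no contact" is handled like any other observation.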

With this formulation, we first discuss some assumptions 
made about interactions with the world. We then present 
our different utility functions $f$, which capture the idea of
reducing the uncertainty in $\Phi$. In general, our objective will
be to achieve a certain amount of uncertainty reduction while 
minimizing the time to do so. 

A. Submodularity Assumptions for Touch Localization 

In order to create objectives that fit into the framework of 
submodular maximization, we must make certain assump- 
tions. First, all actions must be available at every step. 
Intuitively, this makes sense as a necessity for diminishing 
returns - if actions are generated at each step, then a new 
action may simply be better than anything so far. In some 
sense, non-greedy methods which generate actions based on 
the current belief state are optimizing both the utility of 
the current action, and the potential of actions that could 
be generated in the next step. Instead, we generate a large, 
fixed set of information gathering trajectories at the start. 




Fig. 3: The observations for action $a$ and realizations $\phi$ and
$\phi'$. Each observation $a_\phi$ and $a_{\phi'}$ corresponds to the time along
the straight line trajectory when contact first occurs with
the object. We use the difference of times $|a_\phi - a_{\phi'}|$ when
measuring how far apart observations are.



Second, we cannot alter the underlying realization $\phi$, so
actions are not allowed to change the state of the environment 
or objects. Therefore, we cannot intentionally reposition 
objects, or model noise caused by contact. 

When applied to object localization, this framework lends
itself towards heavy objects that remain stationary when 
touched. For such problems, we believe having an efficient 
algorithm with guaranteed near-optimality outweighs these 
limitations. To alleviate some of these limitations, we hope 
to explore near-touch sensors in the future [35], [36]. 

B. Information Gain 

Following Krause and Guestrin [21], we define the information
gain as the reduction in Shannon entropy from
performing actions. Let $\Psi_A$ be the random variable over
$\psi_A$. Then we have

$$f_{IG}(A) = H(\Phi) - H(\Phi \mid \Psi_A)$$

As they show, this function is monotone submodular if
the observations are conditionally independent given the
state $\phi$. Thus, if we are evaluating this offline, we would
be near-optimal compared to the optimal offline solution.
However, this can actually perform exponentially worse than
the online solution [22]. Therefore, we greedily select actions
based on the marginal utility of a single action:

$$\Delta_{IG}(a) = H(\Phi) - \mathbb{E}_o\left[H(\Phi \mid o)\right]$$

We also need to define the probability of an observation.
We consider a "blurred" measurement model where the
probability of stopping at $o$ conditioned on a realization $\phi$ is
weighted based on the time difference between $o$ and $a_\phi$ (the
time of contact had $\phi$ been the true state), with $\sigma$ modelling
the measurement noise:

$$p(a_\Phi = o \mid \phi) \propto \exp\left(-\frac{|o - a_\phi|^2}{2\sigma^2}\right)$$

If we were selecting with a discrete measure of entropy, locality
of particles would not be taken into account. However,
our particles actually represent samples from an underlying
continuous distribution - we should prefer keeping two
nearby particles as opposed to two faraway ones. Thus,
instead of evaluating $H(\Phi)$ directly, we fit a Gaussian
distribution and compute the entropy of that distribution. Let
$\Sigma_o$ be the covariance over the weighted set of hypotheses, and
$N$ the number of parameters (typically $x, y, z, \theta$). We use
the approximated entropy:

$$H(\Phi \mid o) \approx \frac{1}{2} \ln\left((2\pi e)^N |\Sigma_o|\right)$$

After performing the selected action, we update the belief
by reweighting hypotheses as described above. We repeat
the action selection process, setting $\Phi$ to be the updated
distribution, until we reach some desired entropy reduction.
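The Gaussian-fit entropy can be sketched with the standard library alone. The helper names are hypothetical, the determinant uses plain Gaussian elimination for brevity, and the particles in the example are made-up 2D points rather than 4D poses.

```python
import math
from typing import List, Sequence


def weighted_covariance(particles: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[List[float]]:
    """Covariance of a weighted particle set (weights need not sum to 1)."""
    total = sum(weights)
    n = len(particles[0])
    mean = [sum(w * p[i] for p, w in zip(particles, weights)) / total
            for i in range(n)]
    return [[sum(w * (p[i] - mean[i]) * (p[j] - mean[j])
                 for p, w in zip(particles, weights)) / total
             for j in range(n)] for i in range(n)]


def determinant(m: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


def gaussian_entropy(particles, weights) -> float:
    """0.5 * ln((2 pi e)^N |Sigma|); raises ValueError if Sigma is singular."""
    cov = weighted_covariance(particles, weights)
    n = len(cov)
    return 0.5 * math.log((2.0 * math.pi * math.e) ** n * determinant(cov))


# Hypothetical particle sets: a wide cloud and the same cloud shrunk by half.
wide = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
tight = [(0.5 * x, 0.5 * y) for x, y in wide]
w = [0.25, 0.25, 0.25, 0.25]
h_wide, h_tight = gaussian_entropy(wide, w), gaussian_entropy(tight, w)
print(h_wide, h_tight)
```

Unlike a discrete entropy over particle weights, this measure drops when the surviving particles move closer together, which is the locality preference described above.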

C. Hypothesis Pruning 

Intuitively, information gain is attempting to reduce uncer- 
tainty by shrinking the probability mass. Here, we formulate 
a method with the same underlying idea, which we show 
to be adaptive submodular and strongly adaptive monotone. 
We refer to this metric as Hypothesis Pruning, since the idea 
is to prune away hypotheses which do not agree with the 
observations. Golovin et al. describe the connection between 
this sort of query selection and adaptive submodularity by 
drawing a connection to Set Cover [23]. Our formulation is 
similar - see Fig. [2] for a visualization. 

As before, we consider a blurred measurement model. We
consider two different observation models. In the first, we
define a cutoff threshold $d_T$. If a hypothesis is within the
threshold, we keep it entirely. Otherwise, it is removed. We
call this metric Hypothesis Pruning (HP). In the second,
we downweight the hypotheses with a (non-normalized)
Gaussian, and thus remove a portion of the hypothesis. We
call this metric Weighted Hypothesis Pruning (WHP). The
weighting functions are:

$$\omega^{HP}_o(a_\phi) = \begin{cases} 1 & \text{if } |o - a_\phi| \leq d_T \\ 0 & \text{else} \end{cases}$$

$$\omega^{WHP}_o(a_\phi) = \exp\left(-\frac{|o - a_\phi|^2}{2\sigma^2}\right)$$

For a partial realization $\psi$, we take the product of weightings:

$$p_\psi(\phi) = p(\phi) \prod_{a \in A} \omega_{\psi_a}(a_\phi)$$

Note that this can never increase the probability - for any
actions and observations, $p_\psi(\phi) \leq p(\phi)$.
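The two weighting schemes and the resulting belief update can be sketched as follows. The sketch assumes the hard cutoff keeps hypotheses within $d_T$ of the observation and that the soft weighting is an unnormalized Gaussian in the time difference; treating two "no contact" values (both infinite) as agreeing is an additional assumption, and all names and numbers are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def w_hp(o: float, a_phi: float, d_T: float) -> float:
    """Hard cutoff: keep the hypothesis entirely or remove it."""
    if o == a_phi:                      # also covers inf == inf (no contact)
        return 1.0
    return 1.0 if abs(o - a_phi) <= d_T else 0.0


def w_whp(o: float, a_phi: float, sigma: float) -> float:
    """Soft weighting with an unnormalized Gaussian."""
    if o == a_phi:                      # avoids inf - inf = nan
        return 1.0
    return math.exp(-((o - a_phi) ** 2) / (2.0 * sigma ** 2))


def reweight(p_psi: Sequence[float],
             contact_times: Sequence[float],
             o: float,
             weight_fn: Callable[[float, float], float]) -> List[float]:
    """Multiply each hypothesis weight by its agreement with observation o."""
    return [p * weight_fn(o, t) for p, t in zip(p_psi, contact_times)]


# Hypothetical belief over four hypotheses and their contact times a_phi.
p = [0.25, 0.25, 0.25, 0.25]
times = [1.0, 1.05, 1.5, math.inf]
after_hp = reweight(p, times, 1.0, lambda o, t: w_hp(o, t, 0.1))
after_whp = reweight(p, times, 1.0, lambda o, t: w_whp(o, t, 0.1))
print(after_hp, after_whp)
```

Since both weights lie in $[0, 1]$, no hypothesis can gain mass from an update, which is the property the note above relies on.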

To calculate how much probability mass remains with
partial realization $\psi$ (denoted $M_\psi$), and after taking action $a$
and receiving observation $o$ (denoted $m_{\psi,a,o}$), we use:

$$M_\psi = \sum_{\phi' \in \Phi} p_\psi(\phi')$$
$$m_{\psi,a,o} = \sum_{\phi' \in \Phi} p_\psi(\phi') \, \omega_o(a_{\phi'})$$

We can now define the utility of a set of actions if $\phi$ is the
true state. Let $A$ be the sequence of actions taken, and $\psi$ be
the sequence of observations received. Then our utility is:

$$f(A, \phi) = 1 - M_\psi$$

To calculate the expected marginal gain, we also need to
define the probability of receiving any observation. We
present it here, and show the derivation in the Appendix¹.
Intuitively, this will be proportional to how much probability
mass agrees with the observation. Let $O$ be the set of all
possible observations:

$$p(a_\Phi = o \mid \psi) = \frac{m_{\psi,a,o}}{\sum_{o' \in O} m_{\psi,a,o'}}$$

Finally, we define the marginal utility as the additional
probability mass removed. For an observation $o$ this is
$f_{\psi,a,o} = M_\psi - m_{\psi,a,o}$. Thus, the expected marginal gain is:

$$\Delta(a \mid \psi) = \mathbb{E}_o\left[M_\psi - m_{\psi,a,o}\right] = \sum_{o \in O} p(a_\Phi = o \mid \psi) \left[M_\psi - m_{\psi,a,o}\right]$$

In practice, we need to discretize the infinite observation
set $O$. For an action $a$, we do so by considering observations
exactly at each hypothesis, or $O = \{a_\phi : \phi \in \Phi\}$.
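The expected marginal gain with the discretized observation set can be sketched as below. It assumes the observation probability is the mass agreeing with an observation, normalized over the discretized set, and it treats the set of contact times as a set (duplicates collapse); the hard-cutoff weight, the threshold, and the example numbers are hypothetical.

```python
from typing import Callable, Sequence


def w_hp(o: float, a_phi: float, d_T: float = 0.1) -> float:
    """Hard cutoff weight; equal values (including inf, inf) always agree."""
    if o == a_phi:
        return 1.0
    return 1.0 if abs(o - a_phi) <= d_T else 0.0


def expected_marginal_gain(p_psi: Sequence[float],
                           contact_times: Sequence[float],
                           weight_fn: Callable[[float, float], float] = w_hp
                           ) -> float:
    """Expected probability mass removed by one action.

    p_psi[i] is the current weight of hypothesis i and contact_times[i]
    is the time at which this action would touch it.
    """
    M = sum(p_psi)                                  # mass before the action
    observations = set(contact_times)               # O = {a_phi : phi in Phi}
    m = {o: sum(p * weight_fn(o, t) for p, t in zip(p_psi, contact_times))
         for o in observations}                     # mass left after seeing o
    Z = sum(m.values())
    return sum((m[o] / Z) * (M - m[o]) for o in observations)


# Hypothetical belief and two candidate actions.
p = [0.25, 0.25, 0.25, 0.25]
gain_uninformative = expected_marginal_gain(p, [1.0, 1.0, 1.0, 1.0])
gain_informative = expected_marginal_gain(p, [1.0, 2.0, 3.0, 4.0])
print(gain_uninformative, gain_informative)
```

An action that touches every hypothesis at the same time cannot distinguish them and scores zero, while one whose contact times are well separated removes most of the mass in expectation.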

Thus, the greedy algorithm will maximize the expected 
probability mass removed at each step, per unit cost. After 
selecting an action and receiving an observation, the hy- 
potheses are downweighted or removed as described above, 
and action selection is iterated. We now present the main 
guarantee for this method: 

Theorem 1: Let our utility function for Hypothesis Pruning
be $f$ as defined above, utilizing either weighting function
$\omega^{HP}$ or $\omega^{WHP}$. Define $\delta = \min_\phi p(\phi)$. Let $\pi^*_{avg}$ and $\pi^*_{wc}$ be
the optimal policies minimizing the expected and worst-case
number of items selected, respectively, to guarantee every
realization is covered. The greedy policy $\pi^{greedy}$ average
costs at most ^In^^^ +1^ times the average cost of the 

best policy, and ^In + 1^ times the worst case cost of 
the best policy. More formally: 



In 



c.c{7t^''''') < cMc) (in + 1 

Proof: In order to prove Theorem 1, we will need
to show that our objective is adaptive submodular, strongly
adaptive monotone, and self-certifying. We show this in the
Appendix¹. Our proof then follows directly from [23]. ■

In addition to being within a logarithmic factor of optimal, we can
utilize a lazy-greedy algorithm which does not reevaluate all
actions at every step, enabling a computational speedup [23],
[24].
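The idea behind lazy evaluation can be sketched in the simpler non-adaptive setting: by submodularity, a marginal gain computed earlier is an upper bound on the current gain, so stale gains can stay in a priority queue and only the top candidate needs refreshing. The function below and its set cover example are hypothetical illustrations, not the implementation used in the experiments.

```python
import heapq
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def lazy_greedy(items: Sequence[str],
                f: Callable[[List[str]], float],
                budget: int) -> Tuple[List[str], int]:
    """Greedy selection with lazy gain updates.

    Returns the selected items and the number of marginal gain evaluations.
    """
    selected: List[str] = []
    base = f(selected)
    evaluations = 0
    # Heap entries: (-gain, item, number of items selected when computed).
    heap = []
    for a in items:
        evaluations += 1
        heap.append((-(f([a]) - base), a, 0))
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, a, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Gain is up to date and still the largest: select it.
            selected.append(a)
            base = f(selected)
        else:
            # Stale upper bound: refresh it and put it back.
            evaluations += 1
            gain = f(selected + [a]) - base
            heapq.heappush(heap, (-gain, a, len(selected)))
    return selected, evaluations


# Hypothetical set cover instance.
covers: Dict[str, FrozenSet[int]] = {
    "a1": frozenset({1, 2, 3}), "a2": frozenset({3, 4}),
    "a3": frozenset({4, 5, 6}), "a4": frozenset({1, 6})}


def n_covered(acts: List[str]) -> float:
    return float(len(set().union(*[covers[a] for a in acts])))


chosen, n_evals = lazy_greedy(list(covers), n_covered, budget=2)
print(chosen, n_evals)
```

On this instance the lazy variant needs five gain evaluations, where re-scoring every remaining action at each step would need seven; the gap widens as the action set grows.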

V. Experiments 

We implement a greedy action selection scheme with 
each of the methods described above (IG, HP, WHP). In 
addition, we compare against two other schemes - random 
action selection, and a simple human-designed scheme which 
approaches the object orthogonally along the X, Y and Z
axes. Each object pose $\phi$ consists of a 4-tuple $(x, y, z, \theta) \in \mathbb{R}^4$,
where $(x, y, z)$ are the coordinates of the object's center, and
$\theta$ is the rotation about the z axis.

¹Located at http://www.cs.cmu.edu/~sjavdani/touch_loc_submodular.html

We implement our algorithms using a 7-dof Barrett arm
with an attached 4-dof Barrett hand. We localize two objects:
a drill upright on a table, and a door. We define an initial
sensed location $X_s \in \mathbb{R}^4$. To generate the initial $\Phi$, we sample
a Gaussian distribution $N(\mu, \Sigma)$, where $\mu = X_s$, and $\Sigma$ is
the prior covariance of the sensor's noise. For simulation
experiments, we also define the ground truth pose $X_t \in \mathbb{R}^4$.

For efficiency purposes, we also use a fixed number of
particles $|\Phi|$ at all steps, and resample after each selection,
adding small noise to the resampled set of particles.
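One common way to keep the particle count fixed is weighted resampling with replacement followed by a small Gaussian perturbation. The sketch below shows that scheme; the text above does not specify the exact resampling method or noise magnitudes, so the function name, the noise values, and the example particles are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]


def resample(particles: Sequence[Pose],
             weights: Sequence[float],
             n: int,
             noise_std: Sequence[float],
             rng: random.Random) -> List[Pose]:
    """Draw n particles in proportion to their weights, then jitter them."""
    chosen = rng.choices(particles, weights=weights, k=n)
    return [tuple(x + rng.gauss(0.0, s) for x, s in zip(p, noise_std))
            for p in chosen]


# Hypothetical belief: the first pose was pruned (weight 0).
rng = random.Random(0)
particles = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0)]
weights = [0.0, 1.0]
no_noise = resample(particles, weights, 5, (0.0, 0.0, 0.0, 0.0), rng)
jittered = resample(particles, weights, 5, (0.001, 0.001, 0.001, 0.01), rng)
print(no_noise[0], jittered[0])
```

The added noise keeps the resampled set from collapsing onto a handful of identical poses after several pruning steps.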

A. Action Generation 

We generate linear motions of the end effector, consisting 
of a starting pose and a movement vector. Each action starts 
outside of all hypotheses, and moves as far as necessary 
to contact every hypothesis along the path. Note that using
straight-line trajectories is not a requirement for our algo- 
rithm. We generate actions via three main techniques. 

1) Sphere Sampling: Starting positions are generated by
sampling a sphere around the sensed position $X_s$. For each
starting position, the end-effector is oriented to face the
object, and the movement direction set toward $X_s$. A random
rotation is applied about the movement direction, and a random
translation along the plane orthogonal to the movement.

2) Normal Sampling: These actions are intended to have 
the hand's fingers contact the object orthogonally. First, 
we uniformly sample random contacts from the surface 
of the object. Then, for each fingertip, we align its pre- 
defined contact point and normal with the one randomly 
sampled from the object, randomly rotate the hand about 
the contact normal, and set the movement direction as the 
contact normal. 

3) Table Contacting: We generate random start points
around the sensed position $X_s$, and orient the end effector
in the $-z$ direction. These are intended to contact the table
and reduce uncertainty in $z$.
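The first of the three techniques, sphere sampling, can be sketched as follows. The sketch only generates start positions and movement directions pointing at the sensed position; it omits the random rotation about the movement direction and the random in-plane translation, and the radius and sensed position are hypothetical values.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_sphere_starts(center: Vec3, radius: float, n: int,
                         rng: random.Random) -> List[Tuple[Vec3, Vec3]]:
    """Return (start position, unit movement direction) pairs.

    Starts are uniform on a sphere around `center`; each direction points
    from the start back toward the center.
    """
    actions = []
    for _ in range(n):
        # A normalized 3D Gaussian sample is uniform on the unit sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        unit = [c / norm for c in v]
        start = tuple(center[i] + radius * unit[i] for i in range(3))
        direction = tuple(-u for u in unit)
        actions.append((start, direction))
    return actions


# Hypothetical sensed position (x, y, z) and sphere radius in meters.
rng = random.Random(1)
sensed_xyz = (0.5, 0.0, 0.8)
sphere_actions = sample_sphere_starts(sensed_xyz, 0.3, 30, rng)
print(sphere_actions[0])
```

Because the whole action set is generated once up front, it stays fixed across decision steps, as the submodularity assumptions require.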

B. Simulation Experiments Setup 

We simulate an initial sensor error as $X_t - X_s =
(0.015, -0.015, -0.01, 0.05)$ (in meters and radians). Our
initial random realization $\Phi$ is sampled from $N(\mu, \Sigma)$ with
$\mu = X_s$, and $\Sigma$ a diagonal matrix with $\Sigma_{xx} = 0.03$, $\Sigma_{yy} = 0.03$,
$\Sigma_{zz} = 0.03$, $\Sigma_{\theta\theta} = 0.1$. We fix $|\Phi| = 1500$ hypotheses.
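Sampling the initial particle set from a diagonal Gaussian can be sketched as below. Whether the listed diagonal entries are variances or standard deviations is not stated, so the sketch exposes that choice as a flag and defaults to variances as an assumption; the sensed pose is a hypothetical placeholder.

```python
import math
import random
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]


def sample_particles(mean: Sequence[float],
                     diag: Sequence[float],
                     n: int,
                     rng: random.Random,
                     diag_is_variance: bool = True) -> List[Pose]:
    """Draw n poses from a Gaussian with diagonal covariance."""
    # Assumption: diagonal entries are variances unless the flag says otherwise.
    std = [math.sqrt(v) if diag_is_variance else v for v in diag]
    return [tuple(rng.gauss(m, s) for m, s in zip(mean, std))
            for _ in range(n)]


# Hypothetical sensed pose (x, y, z, theta); diagonal values from the text.
rng = random.Random(0)
X_s = (0.5, 0.0, 0.8, 0.0)
particles = sample_particles(X_s, (0.03, 0.03, 0.03, 0.1), 1500, rng)
print(len(particles), particles[0])
```

With a diagonal covariance each pose parameter is drawn independently, so no correlation between position and rotation error is modelled in the prior.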

We then generate an identical action set $\mathcal{A}$ for each metric.
The set consists of the 3 human designed trajectories, 30
sphere sampled trajectories (Section V-A.1), 160 normal
trajectories (Section V-A.2), and 10 table contact trajectories
(Section V-A.3), giving $|\mathcal{A}| = 203$.

We run 10 experiments using a different random seed for
each, generating a different set $\mathcal{A}$ and $\Phi$, but ensuring each
method has the same $\mathcal{A}$ and initial $\Phi$ for a random seed. Each
metric chooses a sequence of five actions, except the human
designed sequence which consists of only three actions.

Fig. 4: Uncertainty after each action for drill and door
experiments. The bars show the mean and 95% CI of the sum
of eigenvalues of the covariance matrix over 10 experiments.



TABLE I: Time to select one action for each metric, average
and 95% CI over drill experiments in Section V-B

            IG               HP             WHP
Time (s)    47.171 ± 0.25    8.41 ± 0.58    25.70 ± 0.29



C. Results 

We analyze the uncertainty reduction of each metric as 
the sum of eigenvalues of the covariance matrix, as in 
Fig. 4. All of the metrics were able to reduce the uncertainty
significantly - confirming our speculation in Section II that
even random actions reduce uncertainty. However, as the 
uncertainty is reduced, the importance of action selection 
increases, as evidenced by the relatively poor performance 
of random selection for the later actions. 
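Since the sum of the eigenvalues of a covariance matrix equals its trace, this uncertainty measure can be computed without an eigendecomposition by summing the per-parameter weighted variances. The sketch below does so for a hypothetical weighted particle set; the function name and the example data are illustrative only.

```python
from typing import Sequence


def covariance_trace(particles: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> float:
    """Sum of eigenvalues of the weighted covariance, computed as its trace."""
    total = sum(weights)
    n = len(particles[0])
    mean = [sum(w * p[i] for p, w in zip(particles, weights)) / total
            for i in range(n)]
    # The trace is the sum of the variances along each parameter.
    return sum(sum(w * (p[i] - mean[i]) ** 2
                   for p, w in zip(particles, weights)) / total
               for i in range(n))


# Hypothetical 2D particle set before and after an update that halves spread.
before = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
after = [(0.5 * x, 0.5 * y) for x, y in before]
w = [0.25, 0.25, 0.25, 0.25]
u_before, u_after = covariance_trace(before, w), covariance_trace(after, w)
print(u_before, u_after)
```

Note that the measure mixes units when poses contain both meters and radians, so it is best read as a relative indicator across steps.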

We note that this measure of uncertainty is good for uni- 
modal distributions, as it assumes a single covariance matrix 
captures the uncertainty. Our Hypothesis Pruning (HP) and 
Weighted Hypothesis Pruning (WHP) methods actually make
no attempt to keep a unimodal distribution, as they naively 
prune hypotheses. On the other hand, our Information Gain 
(IG) method optimizes this measure directly, as it evaluates 
entropy by fitting a Gaussian. Surprisingly, even for this 
measure of uncertainty reduction, our HP and WHP meth- 
ods have comparable performance with IG. Additionally, 
we find that they perform significantly faster, due to both 
their inherent simplicity, and a speedup from a lazy-greedy 
algorithm [23], [24]. See Table I.

The human designed trajectories are effective for the drill, 
but perform poorly on the door. Unlike the drill, the door 
is not radially symmetric, and its flat surface and protruding 
handle offer geometric landmarks that our action selection 
metrics can exploit, making action selection more useful. 

For one drill experiment, we also display the hypothesis set 
after each action in Fig. 5, and the first 3 actions selected in
Table II. Interestingly, each of our metrics selected a different
sequence of actions, though they obtain similar performance. 

Fig. 5: The particle sets $\Phi$ from a single drill experiment after 
each update. Each plotted position corresponds to the $x, y$ 
parameter of $\phi \in \Phi$, rotated by $\theta$. $X_s$ is the sensed position, 
$X_t$ the true position, and $\Phi_i$ the particles after update $i$. Arrow 
lengths are approximately the length of the drill base. The 
initial distribution was generated from a normal distribution 
with $\sigma_x = 0.02$, $\sigma_y = 0.02$, $\sigma_z = 0.02$, $\sigma_\theta = 0.2$. 

1) Robot Experiments: We implemented each of our 
methods (IG, HP, WHP) on a robot with a Barrett arm and 
hand, and attempted to open a door. $X_s$ is initialized with a 
vision system corrupted with an artificial error of 0.035 m in 
the $y$ direction. Our initial random realization $\Phi$ is sampled 
from $\mathcal{N}(\mu, \Sigma)$ with $\mu = X_s$, and $\Sigma$ a diagonal matrix with 
$\Sigma_{xx} = 0.02$, $\Sigma_{yy} = 0.04$, $\Sigma_{zz} = 0.02$, $\Sigma_{\theta\theta} = 0.08$. We fix 
2000 hypotheses. We initially generate 600 normal 
action trajectories (Section V-A.2), though after checking for 
kinematic feasibility, only about 70 remain. 
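
The hypothesis initialization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sensed pose values and the function name `sample_hypotheses` are hypothetical, and the diagonal entries of $\Sigma$ are treated as variances (if they are meant as standard deviations, the square root should be dropped).

```python
import math
import random

def sample_hypotheses(sensed_pose, variances, n, seed=0):
    """Draw n pose hypotheses (x, y, z, theta) from an axis-aligned Gaussian."""
    rng = random.Random(seed)
    # Assumption: the diagonal entries are variances, so take square roots.
    stds = [math.sqrt(v) for v in variances]
    return [
        tuple(rng.gauss(mu, s) for mu, s in zip(sensed_pose, stds))
        for _ in range(n)
    ]

# Hypothetical sensed pose; the diagonal entries are the ones quoted in the text.
sensed_pose = (0.50, 0.10, 0.90, 0.0)
variances = (0.02, 0.04, 0.02, 0.08)

hypotheses = sample_hypotheses(sensed_pose, variances, n=2000)
# Every hypothesis starts with equal weight.
weights = [1.0 / len(hypotheses)] * len(hypotheses)
```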

We utilize each of our uncertainty reducing methods prior 
to using an open-loop sequence to grasp the door handle. 
Once our algorithm selects the next action, we utilize a 
motion planner to transition to its start pose, and perform 
the straight line motion using a task space controller. We 
sense contact by thresholding the magnitude reported by a 
force-torque sensor in the Barrett hand. 

Without touch localization, the robot missed the door han- 
dle entirely. With any of our localization methods, the robot 
successfully opened the door, needing only two uncertainty 
reducing actions to do so. Selected actions are shown in 
Table III, and full videos are provided online. 



TABLE II: First 3 actions selected for each metric from the 
experiment in Fig. 5. The updated particles $\Phi$ are shown in 
yellow, with the previous in grey. 

TABLE III: Actions selected during robot experiment. Note 
that both IG and HP selected the same first action. All metrics 
lead to a successful grasp of the door handle. 

VI. Conclusion and Discussion 

In this work, we drew an explicit connection between 
submodularity and touch-based localization. We presented 
three greedy methods of selecting uncertainty reducing touch 
actions. The first, Information Gain (IG), has been used 
extensively for robot localization [2]-[6], [18], [20]. We noted 
the assumptions necessary for this method to be submodular, 
rendering the greedy algorithm near-optimal in the offline 
setting. We design our own methods, Hypothesis Pruning 
(HP) and Weighted Hypothesis Pruning (WHP), which we 
show are adaptive submodular. Thus, an efficient greedy 
algorithm is guaranteed to provide near-optimal performance in 
the online setting. In addition, these metrics are much faster, 
both due to their simplicity and a more efficient lazy-greedy 
algorithm [23], [24]. We demonstrate good performance for 
all our methods, both in simulation and on a robot. 

One limitation of our current work is the assumption that 
the hand and object are completely rigid, and contact is 
sensed with any force. Some of the actions selected may 
not be robust in the physical world. We hope to incorporate 
better action generation, as well as a more expressive hand 
and sensor model within our metrics to alleviate this. 

Though our hypothesis pruning methods satisfy conditions 
of adaptive submodularity, we note that Information Gain 
performs comparably well. One limitation of our current 
hypothesis pruning formulations is that they naively remove 
hypotheses, with no notion of the underlying continuous 
distribution. Furthermore, they simply reduce uncertainty 
until it falls below some threshold. In actuality, we may wish 
to drive our uncertainty to a particular distribution, dependent 
on the desired task. We hope to extend the ideas developed 
here to formulations which do. 

VII. Appendix 

Here we present the theorems and proofs showing the Hypothesis Pruning metrics are near-optimal. To do so, we prove our metrics are adaptive submodular, strongly adaptive monotone, and self-certifying. We define a function for calculating the total probability mass removed from the original $\Phi$: $\hat{f}(A, \phi) = 1 - M_{\psi_A}$. This function can utilize either of the two reweighting functions $w^{HP}$ or $w^{WHP}$ defined in Section IV-C. Our objective is a truncated version of this: $f(A, \phi) = \min\{Q, \hat{f}(A, \phi)\}$, where $Q$ is the target value for how much probability mass we wish to remove. We assume that the set of all actions $\mathcal{A}$ is sufficient such that $f(\mathcal{A}, \phi) = Q, \forall \phi \in \Phi$. Note that adaptive monotone submodularity is preserved by truncation, so showing these properties for $\hat{f}$ implies them for $f$. 

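First, we show how we derive $p(a_\phi = o \mid \psi) = \frac{m_{\psi,a,o}}{\sum_{o' \in O} m_{\psi,a,o'}}$: 

$$p(a_\phi = o \mid \psi) = \sum_{\phi \in \Phi} p(o \mid \phi, \psi)\, p(\phi \mid \psi)$$

We can think of our weighting function as an unnormalized version of $p(o \mid \phi)$, and $p_\psi(\phi)$ as an unnormalized version of $p(\phi \mid \psi)$. Thus, we define an unnormalized version $\hat{p}(a_\phi = o \mid \psi)$: 

$$\hat{p}(a_\phi = o \mid \psi) = \sum_{\phi \in \Phi} w_o(a_\phi)\, p_\psi(\phi) = m_{\psi,a,o}$$

Finally, we need to normalize over all observations, so we get: 

$$p(a_\phi = o \mid \psi) = \frac{m_{\psi,a,o}}{\sum_{o' \in O} m_{\psi,a,o'}}$$

Now we can compute the expected marginal utility: 

$$\begin{aligned}
\Delta(a \mid \psi_A) &= \mathbb{E}\left[\hat{f}(A \cup \{a\}, \Phi) - \hat{f}(A, \Phi) \mid \psi_A\right] \\
&= \sum_{o \in O} \left( \sum_{\phi \in \Phi} p(\phi \mid \psi_A)\, p(o \mid \phi, \psi_A) \right) \left[ (1 - m_{\psi_A,a,o}) - (1 - M_{\psi_A}) \right] \\
&= \sum_{o \in O} \frac{m_{\psi_A,a,o}}{\sum_{o' \in O} m_{\psi_A,a,o'}} \left[ M_{\psi_A} - m_{\psi_A,a,o} \right]
\end{aligned}$$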



This shows the derivation of the marginal utility, as defined in Section IV-C. We now provide the proof for Theorem 1 
by showing that this utility function is adaptive submodular, strongly adaptive monotone, and self-certifying: 
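
As a concrete illustration of the expected marginal utility $\Delta(a \mid \psi) = \sum_o \frac{m_{\psi,a,o}}{\sum_{o'} m_{\psi,a,o'}} [M_\psi - m_{\psi,a,o}]$, the following minimal sketch evaluates it on a toy hypothesis set. The function names, the two-valued observation space, and the 0/1 weighting `w_exact` are assumptions made for this example; they stand in for, and do not reproduce, the weighting functions of Section IV-C.

```python
def mass_after(weights, predicted, obs, w):
    """m_{psi,a,o}: unnormalized mass left if the action yields observation obs."""
    return sum(p * w(obs, pred) for p, pred in zip(weights, predicted))

def expected_marginal_utility(weights, predicted, observations, w):
    """Delta(a | psi) = sum_o [m_o / sum_o' m_o'] * (M_psi - m_o)."""
    M = sum(weights)  # M_psi: mass remaining before the action
    m = {o: mass_after(weights, predicted, o, w) for o in observations}
    total = sum(m.values())
    if total == 0:
        return 0.0
    return sum((m_o / total) * (M - m_o) for m_o in m.values())

def w_exact(obs, pred):
    # Hypothetical 0/1 weighting: keep a hypothesis only if it predicts obs.
    return 1.0 if obs == pred else 0.0

weights = [0.25, 0.25, 0.25, 0.25]  # p_psi(phi) for four hypotheses
observations = ["contact", "no_contact"]
# Predicted observation a_phi of each hypothesis, for two candidate actions.
action_even = ["contact", "contact", "no_contact", "no_contact"]
action_skew = ["contact", "no_contact", "no_contact", "no_contact"]

delta_even = expected_marginal_utility(weights, action_even, observations, w_exact)
delta_skew = expected_marginal_utility(weights, action_skew, observations, w_exact)
```

In this toy case the action that splits the hypotheses evenly has expected marginal utility 0.5, while the skewed action has 0.375, so a greedy selection would prefer the even split.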

Lemma 1: Let $A \subseteq \mathcal{A}$, which result in partial realizations $\psi_A$. Our objective function defined above is strongly adaptive 
monotone. 

Proof: We need to show that for any action and observation, our objective function will not decrease in value. 
Intuitively, our objective is strongly adaptive monotone, since we only remove probability mass and never add hypotheses. 
More formally: 

$$\begin{aligned}
& \mathbb{E}\left[\hat{f}(A, \Phi) \mid \psi_A\right] \le \mathbb{E}\left[\hat{f}(A \cup \{a\}, \Phi) \mid \psi_A, \psi_a = o\right] \\
\Leftrightarrow\ & 1 - M_{\psi_A} \le 1 - M_{\psi_A \cup \{(a,o)\}} \\
\Leftrightarrow\ & 1 - M_{\psi_A} \le 1 - m_{\psi_A,a,o}
\end{aligned}$$

As noted before, both of the weighting functions defined in Section IV-C never have a value greater than one. Thus each 
term in the sum defining $m_{\psi_A,a,o}$ is no larger than the corresponding term in the sum defining $M_{\psi_A}$, so $m_{\psi_A,a,o} \le M_{\psi_A}$ and the inequality holds. ■ 



Lemma 2: Let $X \subseteq Y \subseteq \mathcal{A}$, which result in partial realizations $\psi_X \subseteq \psi_Y$. Our objective function defined above is adaptive 
submodular. 

Proof: For the utility function $\hat{f}$ to be adaptive submodular, it is required that the following holds over expected 
marginal utilities: 

$$\begin{aligned}
& \Delta(a \mid \psi_Y) \le \Delta(a \mid \psi_X) \\
\Leftrightarrow\ & \sum_{o \in O} \frac{m_{\psi_Y,a,o}}{\sum_{o' \in O} m_{\psi_Y,a,o'}} \left[ M_{\psi_Y} - m_{\psi_Y,a,o} \right] \le \sum_{o \in O} \frac{m_{\psi_X,a,o}}{\sum_{o' \in O} m_{\psi_X,a,o'}} \left[ M_{\psi_X} - m_{\psi_X,a,o} \right]
\end{aligned}$$

We simplify notation a bit for the purposes of this proof. For a fixed partial realization $\psi_X$ and action $a$, let $m_{\psi_X,a,o} = m_o$. 
Additionally, we note that for any action $a$ and observation $o$, it is always true that $m_{\psi_Y,a,o} \le m_{\psi_X,a,o}$ when $X \subseteq Y$. As noted 
before, the weighting functions can only remove probability mass. Let $k_o = m_{\psi_X,a,o} - m_{\psi_Y,a,o}$, which represents the difference 
of probability mass remaining between partial realizations $\psi_Y$ and $\psi_X$ if we performed action $a$ and received observation 
$o$. We note that $k_o \ge 0, \forall o$, which follows from the strong adaptive monotonicity, and $k_o \le m_{\psi_X,a,o}$, which follows from 
$m_{\psi_Y,a,o} \ge 0$. Rewriting the equation above: 

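$$\begin{aligned}
& \sum_{o \in O} \frac{m_o - k_o}{\sum_{o' \in O} (m_{o'} - k_{o'})} \left[ M_{\psi_Y} - m_o + k_o \right] \le \sum_{o \in O} \frac{m_o}{\sum_{o' \in O} m_{o'}} \left[ M_{\psi_X} - m_o \right] \\
\Leftrightarrow\ & \left( \sum_{o \in O} M_{\psi_Y} m_o - m_o^2 + m_o k_o - M_{\psi_Y} k_o + m_o k_o - k_o^2 \right) \left( \sum_{o' \in O} m_{o'} \right) \le \left( \sum_{o \in O} M_{\psi_X} m_o - m_o^2 \right) \left( \sum_{o' \in O} m_{o'} - k_{o'} \right) \\
\Leftrightarrow\ & \sum_{o \in O} \sum_{o' \in O} \left( M_{\psi_Y} m_o m_{o'} - m_o^2 m_{o'} + m_o m_{o'} k_o - M_{\psi_Y} m_{o'} k_o + m_o m_{o'} k_o - m_{o'} k_o^2 \right) \\
& \qquad \le \sum_{o \in O} \sum_{o' \in O} \left( M_{\psi_X} m_o m_{o'} - M_{\psi_X} m_o k_{o'} - m_o^2 m_{o'} + m_o^2 k_{o'} \right) \\
\Leftrightarrow\ & \sum_{o \in O} \sum_{o' \in O} M_{\psi_Y} (m_o m_{o'} - m_{o'} k_o) + 2 m_o m_{o'} k_o - m_{o'} k_o^2 \le \sum_{o \in O} \sum_{o' \in O} M_{\psi_X} (m_o m_{o'} - m_o k_{o'}) + m_o^2 k_{o'}
\end{aligned}$$

We also note that $M_{\psi_X} - M_{\psi_Y} \ge \max_{\hat{o} \in O} (k_{\hat{o}})$. That is, the total difference in probability mass is greater than or equal to the 
difference of probability mass remaining if we received any single observation, for any observation. 

$$\begin{aligned}
\Leftrightarrow\ & \sum_{o \in O} \sum_{o' \in O} 2 m_o m_{o'} k_o - m_{o'} k_o^2 \le \sum_{o \in O} \sum_{o' \in O} (M_{\psi_X} - M_{\psi_Y})(m_o m_{o'} - m_o k_{o'}) + m_o^2 k_{o'} \\
\Leftarrow\ & \sum_{o \in O} \sum_{o' \in O} 2 m_o m_{o'} k_o - m_{o'} k_o^2 \le \sum_{o \in O} \sum_{o' \in O} \max_{\hat{o} \in O}(k_{\hat{o}}) (m_o m_{o'} - m_o k_{o'}) + m_o^2 k_{o'} \\
\Leftarrow\ & \sum_{o \in O} \sum_{o' \in O} 2 m_o m_{o'} k_o - m_{o'} k_o^2 \le \sum_{o \in O} \sum_{o' \in O} \max(k_o, k_{o'}) (m_o m_{o'} - m_o k_{o'}) + m_o^2 k_{o'}
\end{aligned}$$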
oeOo'eo oeOo'eo 

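In order to show the inequality for the sum, we will show it holds for any pair $o, o'$. First, if $o = o'$, then we have an 
equality and it holds trivially. For the case when $o \neq o'$, we assume that $k_o \ge k_{o'}$ WLOG, and show the inequality for the 
sum: 

$$\begin{aligned}
& 2 m_o m_{o'} (k_o + k_{o'}) - m_{o'} k_o^2 - m_o k_{o'}^2 \le 2 m_o m_{o'} k_o - m_o k_{o'} k_o - m_{o'} k_o^2 + m_o^2 k_{o'} + m_{o'}^2 k_o \\
\Leftrightarrow\ & 2 m_o m_{o'} k_{o'} - m_o k_{o'}^2 \le m_o^2 k_{o'} + m_{o'}^2 k_o - m_o k_o k_{o'} \\
\Leftrightarrow\ & 0 \le k_{o'} (m_o - m_{o'})^2 - (k_o - k_{o'}) k_{o'} (m_o - m_{o'}) + (k_o - k_{o'}) m_{o'} (m_{o'} - k_{o'}) \\
\Leftarrow\ & 0 \le k_{o'} (m_o - m_{o'})^2 - (k_o - k_{o'}) k_{o'} (m_o - m_{o'}) + (k_o - k_{o'}) k_{o'} (m_{o'} - k_{o'})
\end{aligned}$$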

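We split into 3 cases: 

A. $k_{o'} = 0$ 

This holds trivially, since the RHS is zero. 

B. $k_{o'} \neq 0$, $m_o \le 2 m_{o'} - k_{o'}$ 

Since $k_{o'} \neq 0$, we can rewrite: 

$$\begin{aligned}
& 0 \le (m_o - m_{o'})^2 - (k_o - k_{o'})(m_o - m_{o'}) + (k_o - k_{o'})(m_{o'} - k_{o'}) \\
\Leftarrow\ & 0 \le -(k_o - k_{o'})(m_o - m_{o'}) + (k_o - k_{o'})(m_{o'} - k_{o'}) \\
\Leftarrow\ & (m_o - m_{o'}) \le (m_{o'} - k_{o'})
\end{aligned}$$

Which follows from the assumption for this case. 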



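C. $m_o > 2 m_{o'} - k_{o'}$ 

We show this step by induction. Let $m_o = 2 m_{o'} - k_{o'} + x$, $x \ge 0$. 
Base Case: $x = 0$, which we showed in the previous case. 

Induction: Assume this inequality holds for $m_o = 2 m_{o'} - k_{o'} + x$. Let $\hat{m}_o = m_o + 1$. We now show that this holds for $\hat{m}_o$: 

$$\begin{aligned}
& 0 \le (\hat{m}_o - m_{o'})^2 - (k_o - k_{o'})(\hat{m}_o - m_{o'}) + (k_o - k_{o'})(m_{o'} - k_{o'}) \\
\Leftrightarrow\ & 0 \le (m_o - m_{o'} + 1)^2 - (k_o - k_{o'})(m_o - m_{o'} + 1) + (k_o - k_{o'})(m_{o'} - k_{o'}) \\
\Leftrightarrow\ & 0 \le (m_o - m_{o'})^2 - (k_o - k_{o'})(m_o - m_{o'}) + (k_o - k_{o'})(m_{o'} - k_{o'}) + 2 m_o - 2 m_{o'} + 1 - k_o + k_{o'} \\
\Leftarrow\ & 0 \le 2 m_o - 2 m_{o'} + 1 - k_o + k_{o'} && \text{by inductive hypothesis} \\
\Leftarrow\ & 0 \le m_o + 1 - k_o && \text{by assumption from case} \\
\Leftarrow\ & 0 \le 1
\end{aligned}$$

And thus, we have shown the inequality holds for any pair $o, o'$. ■ 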

Finally, it is easy to show that the sum can be decomposed into pairs of $o, o'$. Therefore, we can see the inequality over 
the sum also holds. ■ 
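
As a numerical sanity check of the pairwise inequality used in this proof (not a substitute for the argument itself), the following sketch samples random values satisfying the stated constraints $0 \le k_o \le m_o$, $0 \le k_{o'} \le m_{o'}$, $k_o \ge k_{o'}$, and records the smallest slack observed. The function name `pair_slack`, the sampling ranges, and the sample count are choices made for this example.

```python
import random

def pair_slack(m_o, m_p, k_o, k_p):
    """RHS minus LHS of the pairwise inequality for (o, o'), with k_o >= k_o'.

    Inequality: 2 m_o m_o' k_o' - m_o k_o'^2
                <= m_o^2 k_o' + m_o'^2 k_o - m_o k_o k_o'
    """
    lhs = 2 * m_o * m_p * k_p - m_o * k_p ** 2
    rhs = m_o ** 2 * k_p + m_p ** 2 * k_o - m_o * k_o * k_p
    return rhs - lhs

rng = random.Random(0)
worst = float("inf")
for _ in range(100_000):
    # Remaining masses in [0, 1), and removed masses bounded by them.
    m_o, m_p = rng.random(), rng.random()
    k_o, k_p = rng.uniform(0, m_o), rng.uniform(0, m_p)
    if k_o < k_p:
        # WLOG k_o >= k_o': swap the roles of the two observations.
        m_o, m_p, k_o, k_p = m_p, m_o, k_p, k_o
    worst = min(worst, pair_slack(m_o, m_p, k_o, k_p))
```

If the inequality holds, `worst` stays non-negative up to floating-point rounding.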

Lemma 3: Let $A \subseteq \mathcal{A}$, which result in partial realizations $\psi_A$. The utility function $f$ defined above is self-certifying. 

Proof: An instance is self-certifying if whenever the maximum value is achieved for the utility function $f$, it is achieved 
for all realizations consistent with the observation. See [23] for a more rigorous definition. Golovin and Krause point out 
that any instance which only depends on the state of items in $A$ is automatically self-certifying (Proposition 5.6 in [23]). 
That is the case here, since the objective function $f = \min\{Q, 1 - M_{\psi_A}\}$ only depends on the outcome of actions in $A$. 
Therefore, our instance is self-certifying. ■ 

As we have shown our objective is adaptive submodular, strongly adaptive monotone, and self-certifying, Theorem 1 
follows from Theorems 5.8 and 5.9 from [23]. Following their notation, we note that $\eta = \min_\phi p(\phi)$, since it is always true 
that $f(S, \phi) > Q - \min_\phi p(\phi)$ implies $f(S, \phi) = Q$. 
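
To make the truncated objective concrete, the following minimal sketch runs a greedy policy that keeps selecting the action with the highest expected marginal utility until at least $Q$ probability mass has been removed. Everything specific here is an assumption for illustration: the action names, the noiseless simulated observations, and the 0/1 weighting `w_exact`, which stands in for the weighting functions of Section IV-C.

```python
def delta(weights, predicted, w):
    """Expected marginal utility of one action under the current weights."""
    M = sum(weights)
    outcomes = set(predicted)
    m = [sum(p * w(o, pred) for p, pred in zip(weights, predicted)) for o in outcomes]
    total = sum(m)
    return sum(m_o / total * (M - m_o) for m_o in m) if total > 0 else 0.0

def greedy_policy(prior, actions, true_index, Q, w):
    """Greedily select actions until at least Q probability mass is removed."""
    weights = list(prior)      # unnormalized p_psi(phi)
    remaining = dict(actions)  # action name -> predicted observation per hypothesis
    chosen = []
    while remaining and 1.0 - sum(weights) < Q:
        name = max(remaining, key=lambda a: delta(weights, remaining[a], w))
        predicted = remaining.pop(name)
        obs = predicted[true_index]  # simulated noiseless observation
        weights = [p * w(obs, pred) for p, pred in zip(weights, predicted)]
        chosen.append(name)
    return chosen, 1.0 - sum(weights)

def w_exact(obs, pred):
    # Hypothetical 0/1 weighting: keep a hypothesis only if it predicts obs.
    return 1.0 if obs == pred else 0.0

prior = [0.25, 0.25, 0.25, 0.25]
actions = {
    "touch_left":  [0, 0, 1, 1],
    "touch_front": [0, 1, 0, 1],
    "touch_top":   [0, 1, 1, 1],
}
chosen, removed = greedy_policy(prior, actions, true_index=2, Q=0.75, w=w_exact)
```

With four equally likely hypotheses and $Q = 0.75$, the policy stops after two actions, once only the true hypothesis retains mass.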






